SPEECH CODING APPARATUS AND SPEECH DECODING APPARATUS

Technical Field 

The present invention relates to a speech coding apparatus
and a speech decoding apparatus that use a low-bit-rate speech
coding algorithm and are used in digital communications
equipment such as portable telephones.

Background Art 

Low-bit-rate speech compression coding methods have been
required in order to accommodate the growing number of
subscribers in digital mobile communications such as portable
telephony, and many research institutions have been pursuing
research and development. In Japan, the coding systems adopted
as standards for portable telephones are VSELP at a bit rate of
11.2 kbps, developed by Motorola, and PSI-CELP at a bit rate of
5.6 kbps, developed by NTT Mobile Communications Network, Inc.,
and portable telephones using these systems are in production.

Internationally, the ITU-T selected CS-ACELP, which was
co-developed by Nippon Telegraph and Telephone Corporation and
France Telecom, as the international standard speech coding
system G.729 at 8 kbps. This system is scheduled to be used in
Japan as a speech coding system for portable telephones.

The above-described systems are all modifications of the CELP
system (Code Excited Linear Prediction: M.R. Schroeder, "High
Quality Speech at Low Bit Rates", Proc. ICASSP '85,
pp. 937-940). This system divides speech into excitation
information and vocal tract information, codes the excitation
information as indices of a plurality of excitation samples
stored in a codebook, and codes the vocal tract information as
LPC (Linear Prediction Coefficients). In coding the excitation
information, candidates are compared with the input speech
while taking the vocal tract information into account (A-b-S:
Analysis by Synthesis).

The basic algorithm of the CELP system will be described using
FIG. 1. FIG. 1 is a block diagram illustrating a configuration
of a speech coding apparatus in the CELP system. In the speech
coding apparatus illustrated in FIG. 1, LPC analyzing section 2
executes autocorrelation analysis and LPC analysis on input
speech data 1 to obtain the LPC. LPC analyzing section 2
further codes the obtained LPC to obtain the coded LPC, and
then decodes the coded LPC to obtain the decoded LPC.
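
The autocorrelation analysis and LPC analysis performed by the
LPC analyzing section can be illustrated by the following
minimal Python sketch. It assumes the autocorrelation method
solved with the Levinson-Durbin recursion, which is a common
choice but is not specified in this description; the function
names, the absence of windowing and the sign convention of the
predictor are likewise assumptions made for illustration.

```python
def autocorrelation(frame, order):
    """Return r[0..order], the autocorrelation of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r):
    """Solve the normal equations for the predictor coefficients.

    Assumed convention: s_hat[n] = sum_k a[k] * s[n - k], k = 1..p.
    Returns (a[1..p], final prediction error energy).
    """
    order = len(r) - 1
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep remaining terms at zero
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        k_i = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= (1.0 - k_i * k_i)
    return a[1:], err
```

Under this convention, a frame that decays as 0.9^n yields a
first coefficient close to 0.9 and a second coefficient close
to zero.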

Excitation generating section 5 fetches excitation samples
stored in adaptive codebook 3 and stochastic codebook 4
(referred to respectively as an adaptive code vector (or
adaptive excitation) and a stochastic code vector (or
stochastic excitation)) and provides the respective excitation
samples to LPC synthesis section 6. LPC synthesis section 6
filters the two excitations obtained at excitation generating
section 5 with the decoded LPC obtained at LPC analyzing
section 2.

Comparing section 7 analyzes the relation between the two
synthesized speeches obtained at LPC synthesis section 6 and
the input speech, obtains an optimal value (optimal gain) for
each of the two synthesized speeches, adds the synthesized
speeches after adjusting the power of each with its optimal
gain to obtain a total synthesized speech, and calculates the
distance between the total synthesized speech and the input
speech. Comparing section 7 further calculates, for all
excitation samples in adaptive codebook 3 and stochastic
codebook 4, the distances between the input speech and the many
other synthesized speeches obtained by operating excitation
generating section 5 and LPC synthesis section 6, and obtains
the index of the excitation sample that gives the smallest
distance. Comparing section 7 then provides the obtained
optimal gains, the indices of the excitation samples of the
respective codebooks, and the two excitation samples
corresponding to those indices to parameter coding section 8.
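
The search carried out by the comparing section can be sketched
as follows. For brevity the sketch searches a single codebook
with a single gain, whereas the apparatus described above
handles the adaptive and stochastic codebooks together. The
squared-error distance, the closed-form optimal gain, the
zero-state synthesis filter and all function names are
assumptions made for illustration.

```python
def lpc_synthesis(excitation, lpc):
    """All-pole filter y[n] = e[n] + sum_k lpc[k-1] * y[n-k], zero state."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, gain, distance) of the entry closest to the target."""
    best = (None, 0.0, float("inf"))
    for index, vector in enumerate(codebook):
        synth = lpc_synthesis(vector, lpc)
        energy = sum(y * y for y in synth)
        if energy == 0.0:
            continue  # an all-zero entry cannot match anything
        # Gain that minimises the squared error for this entry.
        gain = sum(x * y for x, y in zip(target, synth)) / energy
        dist = sum((x - gain * y) ** 2 for x, y in zip(target, synth))
        if dist < best[2]:
            best = (index, gain, dist)
    return best
```

Every entry is synthesized and compared, which corresponds to
the exhaustive distance calculation described above; practical
coders usually restructure this computation to reduce its cost.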

Parameter coding section 8 codes the optimal gain to obtain the
coded gain and provides the coded gain, the coded LPC and the
indices of the excitation samples to transmission path 9.
Further, parameter coding section 8 generates an actual
excitation signal (synthesized excitation) using the coded gain
and the two excitations corresponding to the respective
indices, and stores the excitation signal in adaptive codebook
3 while deleting old excitation samples.

In the synthesis at LPC synthesis section 6, it is common to
use the Linear Prediction Coefficients together with a
high-frequency enhancement filter or a perceptual weighting
filter with long-term prediction coefficients (which are
obtained by long-term prediction analysis of the input speech).
It is also common to execute the excitation search on the
adaptive codebook and stochastic codebook at an interval
(called a subframe) obtained by further dividing an analysis
interval.

The stochastic codebook will be described next. 

The adaptive codebook is a codebook for effective compression
that exploits the long-term correlation present at the period
of human vocal cord vibration, and stores previously
synthesized excitations. In contrast, the stochastic codebook
is a fixed codebook that reflects the statistical
characteristics of excitation signals. Excitation samples
stored in the stochastic codebook include, for example, random
number sequences, pulse sequences, random number sequences or
pulse sequences obtained by statistical training with speech
data, and pulse sequences with a relatively small number of
pulses generated algebraically (algebraic codebook). The
algebraic codebook has attracted particular attention recently
and is known to provide good sound quality at bit rates such as
8 kbps with a small amount of computation.

However, applying an algebraic codebook with a small number of
pulses to coding at lower bit rates causes sound quality to
deteriorate greatly, mainly on unvoiced consonants and
background noise. On the other hand, applying an excitation
with a large number of pulses, such as a random number
sequence, to coding at lower bit rates causes sound quality to
deteriorate greatly, mainly on voiced speech. To reduce this
deterioration, a multi-codebook method in which a
voiced/unvoiced judgement is performed has been examined.
However, this method involves complicated processing and
sometimes generates an allophone caused by a judgement error on
a speech signal.

As described above, no algebraic codebook has been available
that codes voiced speech, unvoiced speech and background noise
all effectively. Therefore, a speech coding apparatus and a
speech decoding apparatus capable of effectively coding any of
voiced speech, unvoiced speech and background noise have been
required.

Disclosure of Invention 

An object of the present invention is to provide a speech
coding apparatus and a speech decoding apparatus capable of
effectively coding any of voiced speech, unvoiced speech and
background noise, and of obtaining speech of excellent quality
with a small amount of information and a small amount of
computation.

The inventors of the present invention noticed that, when a
pulse sequence is applied to coding at low bit rates, pulse
positions are relatively near in voiced segments of speech,
while pulse positions are relatively far in unvoiced segments
of speech and in background noise segments. In other words, the
inventors noticed that voiced speech needs energy-concentrated
excitation samples, which are characteristic of the human vocal
cord wave, and in this case a small number of pulses whose
positions are near tend to be selected, while unvoiced speech
and background noise need an excitation with more random
characteristics, and in this case a large number of
energy-spread pulses tend to be selected.

Based on the foregoing consideration, the inventors found that
perceptual quality is improved by identifying a speech segment
as either a voiced segment or an unvoiced/background noise
segment from the distance between pulse positions and, based on
the identification result, applying a pulse sequence
appropriate for the voiced segment or for the unvoiced and
background noise segments, and thereby achieved the present
invention.

A feature of the present invention is to use a plurality of
codebooks each having two subcodebooks with different
characteristics, and to add the excitation vectors of the
subcodebooks to obtain an excitation vector. According to this
algorithm, owing to the positional relationship of the
excitation vectors with a small number of pulses, the
characteristics of a small-number-pulse excitation appear in
the case where pulse positions are near, while the
characteristics of a large-number-pulse excitation appear in
the case where pulse positions are far, which suits the
characteristics of speech signals containing background noise.

Accordingly, without using a dedicated voiced/unvoiced speech
judgement algorithm, it is possible to automatically select the
excitation most suitable for the localized characteristics of
input signals, effectively code any of voiced speech, unvoiced
speech and background noise, and obtain synthesized speech of
excellent sound quality with a small amount of information and
a small amount of computation.

Brief Description of Drawings 

FIG. 1 is a block diagram illustrating a configuration of a
speech coding apparatus in a conventional CELP system;

FIG. 2 is a block diagram illustrating a configuration of a
radio communication apparatus having a speech coding apparatus
and a speech decoding apparatus of the present invention;

FIG. 3 is a block diagram illustrating a configuration of a
speech coding apparatus in a CELP system according to first to
third embodiments of the present invention;

FIG. 4 is a block diagram illustrating a configuration of a
speech decoding apparatus in the CELP system according to the
first to third embodiments of the present invention;

FIG. 5 is a block diagram illustrating a stochastic codebook in
a speech coding apparatus/speech decoding apparatus according
to the first embodiment of the present invention;

FIG. 6A and FIG. 6B are conceptual diagrams of sub-excitation
vectors stored in subcodebooks in the stochastic codebook;

FIGs. 7A to 7F are conceptual diagrams explaining a method of
generating an excitation sample;

FIG. 8 is a block diagram illustrating a stochastic codebook in
a speech coding apparatus/speech decoding apparatus according
to the second embodiment of the present invention;

FIG. 9 is a block diagram illustrating a stochastic codebook in
a speech coding apparatus/speech decoding apparatus according
to the third embodiment of the present invention;

FIG. 10A and FIG. 10B are conceptual diagrams of sub-excitation
vectors stored in subcodebooks in the stochastic codebook;

FIGs. 11A to 11F are conceptual diagrams explaining a method of
generating an excitation sample; and

FIG. 12 is a diagram illustrating a schematic configuration of
a data medium storing a program for the speech coding
apparatus/speech decoding apparatus of the present invention.

Best Mode for Carrying Out the Invention 

Embodiments of the present invention will be described in
detail with reference to accompanying drawings.
(First embodiment)

FIG. 2 is a block diagram illustrating a configuration of a
radio communication apparatus having a speech coding/decoding
apparatus according to the first to third embodiments of the
present invention.

In this radio communication apparatus, at the transmitting
side, speech is converted into electric analogue signals at
speech input device 21 such as a microphone and output to A/D
converter 22. The analogue speech signals are converted into
digital speech signals at A/D converter 22 and output to speech
coding section 23. Speech coding section 23 executes speech
coding processing on the digital speech signals and outputs the
coded data to modulation/demodulation circuit 24.
Modulation/demodulation circuit 24 digitally modulates the
coded speech signals and outputs them to radio transmission
circuit 25. Radio transmission circuit 25 executes
predetermined radio transmission processing on the modulated
signals. The signals are transmitted via antenna 26. In
addition, processor 31 executes the processing using data
stored in RAM 32 and ROM 33 as appropriate.

On the other hand, at the receiving side of the radio
communication apparatus, signals received at antenna 26 are
subjected to predetermined radio reception processing at radio
reception circuit 27 and output to modulation/demodulation
circuit 24. Modulation/demodulation circuit 24 demodulates the
received signals and outputs the demodulated signals to speech
decoding section 28. Speech decoding section 28 executes
decoding processing on the demodulated signals to obtain
digital decoded speech signals and outputs them to D/A
converter 29. D/A converter 29 converts the digital decoded
speech signals output from speech decoding section 28 into
analogue decoded speech signals and outputs them to speech
output device 30 such as a speaker. Finally, speech output
device 30 converts the electric analogue decoded speech signals
into decoded speech and outputs it.

Speech coding section 23 and speech decoding section 28 are
operated by processor 31, such as a DSP, using codebooks stored
in RAM 32 and ROM 33. The operation program is also stored in
ROM 33.

FIG. 3 is a block diagram illustrating a configuration of a
speech coding apparatus in the CELP system according to the
first to third embodiments of the present invention. The speech
coding apparatus is included in speech coding section 23
illustrated in FIG. 2. In addition, adaptive codebook 43
illustrated in FIG. 3 is stored in RAM 32 illustrated in
FIG. 2, and stochastic codebook 44 illustrated in FIG. 3 is
stored in ROM 33 illustrated in FIG. 2.

In the speech coding apparatus (hereinafter also referred to as
the coder) illustrated in FIG. 3, LPC analyzing section 42
executes autocorrelation analysis and LPC analysis on input
speech data 41 to obtain the LPC. LPC analyzing section 42
further codes the obtained LPC to obtain the LPC code, and then
decodes the LPC code to obtain the decoded LPC. In the coding,
the LPC are generally converted into parameters with good
interpolation characteristics, such as LSP (Line Spectrum
Pair), and then coded by VQ (Vector Quantization).

Excitation generating section 45 fetches excitation samples
stored in adaptive codebook 43 and stochastic codebook 44
(referred to respectively as an adaptive code vector (or
adaptive excitation) and a stochastic code vector (or
stochastic excitation)) and provides the respective excitation
samples to LPC synthesis section 46. The adaptive codebook is a
codebook in which previously synthesized excitation signals are
stored, and its index represents which of the excitations
synthesized at different previous times is used, i.e., a time
lag.
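
How an index interpreted as a time lag selects an adaptive code
vector can be sketched as follows. The function name and the
flat list used as the excitation history are hypothetical, and
the periodic repetition applied when the lag is shorter than
the vector length is an assumption borrowed from common CELP
practice rather than something stated in this description.

```python
def adaptive_code_vector(past_excitation, lag, length):
    """Read `length` samples starting `lag` samples before the end.

    If lag < length, the available segment is repeated periodically
    (an assumed handling, see the note above).
    """
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag out of range")
    start = len(past_excitation) - lag
    segment = past_excitation[start:start + length]
    return [segment[i % len(segment)] for i in range(length)]
```

For a history holding the values 0 to 9, a lag of 4 and a
length of 3 return the samples 6, 7, 8, and a lag of 2 with a
length of 5 returns 8, 9, 8, 9, 8.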

LPC synthesis section 46 executes filtering on two 
excitations obtained at excitation generating section 45 with 
the decoded LPC obtained at LPC analyzing section 42. 

Comparing section 47 analyzes the relation between the two
synthesized speeches obtained at LPC synthesis section 46 and
the input speech, obtains an optimal value (optimal gain) for
each of the two synthesized speeches, adds the synthesized
speeches after adjusting the power of each with its optimal
gain to obtain a total synthesized speech, and calculates the
distance between the total synthesized speech and the input
speech. Comparing section 47 further calculates, for all
excitation samples in adaptive codebook 43 and stochastic
codebook 44, the distances between the input speech and the
many other synthesized speeches obtained by operating
excitation generating section 45 and LPC synthesis section 46,
and obtains the index of the excitation sample that gives the
smallest distance. Comparing section 47 then provides the
obtained optimal gains, the indices of the excitation samples
of the respective codebooks, and the two excitation samples
corresponding to those indices to parameter coding section 48.

Parameter coding section 48 executes coding on the 
optimal gain to obtain the gain code and provides the gain code, 
the LPC code and the indices of excitation samples to 
transmission path 49. Further, parameter coding section 48 
generates an actual excitation signal (synthesized 
excitation) using the gain code and two excitations 
corresponding to the index and stores the excitation signal 
in adaptive codebook 43 while deleting old excitation samples. 

In the synthesis at LPC synthesis section 46, it is common to
use the Linear Prediction Coefficients together with a
high-frequency enhancement filter or a perceptual weighting
filter with long-term prediction coefficients (which are
obtained by long-term prediction analysis of the input speech).
It is also common to execute the excitation search on the
adaptive codebook and stochastic codebook at an interval
(called a subframe) obtained by further dividing an analysis
interval.

FIG. 4 is a block diagram illustrating a configuration of a
speech decoding apparatus in the CELP system according to the
first to third embodiments of the present invention. The speech
decoding apparatus is included in speech decoding section 28
illustrated in FIG. 2. In addition, adaptive codebook 53
illustrated in FIG. 4 is stored in RAM 32 illustrated in
FIG. 2, and stochastic codebook 54 illustrated in FIG. 4 is
stored in ROM 33 illustrated in FIG. 2.

In the speech decoding apparatus illustrated in FIG. 4,
parameter decoding section 52 obtains coded speech signals from
transmission path 51, and obtains from them the codes of the
excitation samples of the respective excitation codebooks
(adaptive codebook 53 and stochastic codebook 54), the coded
LPC and the coded gain. Parameter decoding section 52 then
obtains the decoded LPC from the coded LPC and the decoded gain
from the coded gain.

Excitation generating section 55 multiplies each excitation
sample by the corresponding decoded gain to obtain decoded
excitation signals. At this stage, excitation generating
section 55 stores the obtained decoded excitation signals in
adaptive codebook 53 as excitation samples, while deleting old
excitation samples. LPC synthesis section 56 filters the
decoded excitation signals with the decoded LPC to obtain
synthesized speech.
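
The decoding steps above (gain scaling, adaptive codebook
update and LPC synthesis) can be sketched for one subframe as
follows. The function name, the separate gains for the two
excitations, the list-based codebook buffer and the filter
memory carried between subframes are assumptions made for
illustration.

```python
def decode_subframe(adaptive_vec, stochastic_vec, gain_a, gain_s,
                    lpc, filter_memory, adaptive_codebook):
    """Return (speech, new_filter_memory, new_adaptive_codebook)."""
    order = len(lpc)
    if len(filter_memory) < order:
        raise ValueError("filter memory shorter than LPC order")

    # Scale each excitation sample by its decoded gain and add them.
    excitation = [gain_a * a + gain_s * s
                  for a, s in zip(adaptive_vec, stochastic_vec)]

    # Store the decoded excitation and drop the same number of old samples.
    size = len(adaptive_codebook)
    new_codebook = (adaptive_codebook + excitation)[-size:]

    # All-pole synthesis y[n] = e[n] + sum_k lpc[k-1] * y[n-k];
    # `past` holds earlier outputs with the newest sample last.
    past = list(filter_memory)
    speech = []
    for e in excitation:
        acc = e
        for k, a in enumerate(lpc, start=1):
            acc += a * past[-k]
        past.append(acc)
        speech.append(acc)
    return speech, past[len(past) - order:], new_codebook
```

Returning the updated filter memory and codebook lets the same
function be called once per subframe with the state produced by
the previous call.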

These two excitation codebooks are the same as those included
in the speech coding apparatus illustrated in FIG. 3 (reference
numerals 43 and 44 in FIG. 3). The sample numbers used to fetch
excitation samples (the code for the adaptive codebook and the
code for the stochastic codebook) are both supplied from
parameter decoding section 52 (which corresponds to the
short-dashed line in FIG. 5 (control from comparing section 47)
described later).

The functions of stochastic codebooks 44 and 54, which store
excitation samples in the speech coding apparatus and speech
decoding apparatus configured as above, will now be described
in detail using FIG. 5. FIG. 5 is a block diagram illustrating
a stochastic codebook in the speech coding apparatus and speech
decoding apparatus according to the first embodiment of the
present invention.

The stochastic codebook has first codebook 61 and second 
codebook 62, and first codebook 61 and second codebook 62 
respectively have two subcodebooks 61a, 61b and 62a, 62b. The 
stochastic codebook further has gain calculating section 63 
which calculates a gain for outputs from subcodebooks 61b and 
62b using pulse positions in subcodebooks 61a and 62a. 

Subcodebooks 61a and 62a are mainly used in the case where the
speech is a voiced sound (pulse positions are relatively near),
and are formed by storing a plurality of sub-excitation vectors
each composed of a single pulse. Subcodebooks 61b and 62b are
mainly used in the case where the speech is an unvoiced sound
or background noise (pulse positions are relatively far), and
are formed by storing a plurality of sub-excitation vectors
each composed of a sequence of a plurality of pulses in which
power is spread. The excitation samples are generated in the
stochastic codebooks formed as described above. The near and
far pulse positions will be described later.

In addition, subcodebooks 61a and 62a are formed by a method of
arranging pulses algebraically, and subcodebooks 61b and 62b
are formed by another method of dividing the vector length
(subframe length) into several segment intervals and
configuring the vectors so that a single pulse is always
present in every segment interval (pulses are spread over the
whole length).

These codebooks are formed in advance. In this embodiment, as
illustrated in FIG. 5, the number of codebooks is set at two
and each codebook has two subcodebooks.

FIG. 6A illustrates sub-excitation vectors stored in
subcodebook 61a of first codebook 61. FIG. 6B illustrates
sub-excitation vectors stored in subcodebook 61b of first
codebook 61. Similarly, subcodebooks 62a and 62b of second
codebook 62 respectively have the sub-excitation vectors
illustrated in FIG. 6A and FIG. 6B.

In addition, the positions and polarities of the pulses of the
sub-excitation vectors in subcodebooks 61b and 62b are formed
using random numbers. According to the configuration described
above, it is possible to form sub-excitation vectors in which
power is uniformly spread over the whole vector length, even
though some fluctuations are present. FIG. 6B illustrates an
example in the case where the number of segment intervals is
four. In addition, in these two subcodebooks, the respective
sub-excitation vectors of the same index (number) are used at
the same time.
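
The construction of the two kinds of subcodebooks can be
sketched as follows. The function names are hypothetical, the
list of candidate positions passed to the single-pulse builder
stands in for the algebraic arrangement (which is not detailed
here), and unit pulse amplitudes are assumed.

```python
import random


def single_pulse_vectors(length, positions, polarity=1.0):
    """One sub-excitation vector per candidate position (61a/62a style)."""
    vectors = []
    for p in positions:
        v = [0.0] * length
        v[p] = polarity
        vectors.append(v)
    return vectors


def spread_pulse_vector(length, segments, rng):
    """One pulse per segment interval (61b/62b style).

    The position inside each segment and the polarity are drawn from
    the random number generator `rng`; requires length >= segments.
    """
    bounds = [length * s // segments for s in range(segments + 1)]
    v = [0.0] * length
    for lo, hi in zip(bounds, bounds[1:]):
        v[rng.randrange(lo, hi)] = rng.choice((-1.0, 1.0))
    return v
```

For example, `spread_pulse_vector(40, 4, random.Random(0))`
returns a 40-sample vector with exactly one pulse in each
10-sample segment, matching the four-segment case of FIG. 6B.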

Speech coding using the stochastic codebooks with the
above-mentioned configuration will be described next.

Gain calculating section 63 calculates an excitation vector
number (index) according to the code from comparing section 47
in the speech coding apparatus. The code provided from
comparing section 47 corresponds to the excitation vector
number, and therefore the excitation vector number is
determined by the code. Gain calculating section 63 fetches the
sub-excitation vectors with a small number of pulses
corresponding to the determined excitation vector number from
subcodebooks 61a and 62a. Gain calculating section 63 further
calculates an addition gain using the pulse positions of the
fetched sub-excitation vectors. The addition gain is calculated
by the following equation (1):

g = |P1 - P2| / L   ... equation (1)

where g is the addition gain, P1 and P2 are respectively the
pulse positions in subcodebooks 61a and 62a, and L is the
vector length (subframe length). Further, | | represents an
absolute value.

According to the above equation (1), the addition gain is
smaller as the pulse positions are nearer (the pulse distance
is shorter) and larger as the pulse positions are farther, and
has a lower limit of 0 and an upper limit of 1. Accordingly, as
the pulse positions are nearer, the gain for subcodebooks 61b
and 62b is relatively smaller. As a result, the effect of
subcodebooks 61a and 62a, which correspond to voiced speech, is
larger. On the other hand, as the pulse positions are farther
(the pulse distance is longer), the gain for subcodebooks 61b
and 62b is relatively larger. As a result, the effect of
subcodebooks 61b and 62b, which correspond to unvoiced speech
and background noise, is relatively larger. Perceptually fine
sounds are obtained by performing the gain control described
above.
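
The behaviour of equation (1) can be checked with a short
sketch. The function name, the range check on the positions and
the subframe length of 80 samples used in the example are
assumptions made for illustration.

```python
def addition_gain(p1, p2, length):
    """Equation (1): g = |P1 - P2| / L for positions inside the subframe."""
    if not (0 <= p1 < length and 0 <= p2 < length):
        raise ValueError("pulse positions must lie inside the subframe")
    return abs(p1 - p2) / length


near = addition_gain(10, 14, 80)   # pulses 4 samples apart
far = addition_gain(10, 70, 80)    # pulses 60 samples apart
```

With these illustrative values the near pair gives g = 0.05 and
the far pair gives g = 0.75, so the spread-pulse subcodebooks
contribute far more in the second case.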

Next, gain calculating section 63 refers to the excitation
vector number provided from comparing section 47 and obtains
two sub-excitation vectors from subcodebooks 61b and 62b with a
large number of pulses. These two sub-excitation vectors from
subcodebooks 61b and 62b are respectively provided to gain
calculating sections 64 and 65 to be multiplied by the addition
gain obtained at gain calculating section 63.

Further, excitation vector addition section 66 obtains a
sub-excitation vector from subcodebook 61a with a small number
of pulses by referring to the excitation vector number provided
from comparing section 47, and also obtains the sub-excitation
vector from subcodebook 61b multiplied by the addition gain
obtained at gain calculating section 63. Excitation vector
addition section 66 then adds the obtained sub-excitation
vectors to obtain an excitation vector. Similarly, excitation
vector addition section 67 obtains a sub-excitation vector from
subcodebook 62a with a small number of pulses by referring to
the excitation vector number provided from comparing section
47, and also obtains the sub-excitation vector from subcodebook
62b multiplied by the addition gain obtained at gain
calculating section 63. Excitation vector addition section 67
then adds the obtained sub-excitation vectors to obtain an
excitation vector.

The excitation vectors obtained by adding the sub-excitation
vectors are provided to excitation vector addition section 68
to be added together. According to the foregoing processing, an
excitation sample (stochastic code vector) is obtained. The
excitation sample is provided to excitation generating section
45 and parameter coding section 48.
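
The generation of one stochastic code vector from the four
sub-excitation vectors can be sketched as follows. The function
names are hypothetical, and locating each pulse position as the
first nonzero sample of the single-pulse vector is an
assumption made for illustration.

```python
def pulse_position(vector):
    """Index of the first nonzero sample of a single-pulse vector."""
    return next(i for i, x in enumerate(vector) if x != 0)


def stochastic_code_vector(a1, b1, a2, b2):
    """Combine sub-excitation vectors from 61a, 61b, 62a and 62b.

    Returns (code_vector, addition_gain).
    """
    length = len(a1)
    # Gain calculating section 63: equation (1).
    g = abs(pulse_position(a1) - pulse_position(a2)) / length
    # Sections 64/65 scale the spread-pulse vectors; 66/67 add each pair.
    v1 = [a + g * b for a, b in zip(a1, b1)]
    v2 = [a + g * b for a, b in zip(a2, b2)]
    # Section 68 adds the two excitation vectors.
    return [x + y for x, y in zip(v1, v2)], g
```

When both single pulses fall on the same position the gain is
zero and only the pulses remain; as the pulses move apart the
spread-pulse vectors are mixed in with increasing weight.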

On the other hand, the decoding side prepares in advance the
same adaptive codebook and stochastic codebook as those in the
coder and, based on the index of each codebook, the LPC code
and the gain code transmitted over the transmission path,
multiplies the respective excitation samples by the gains and
adds them. The decoding side then filters the sum with the
decoded LPC to decode the speech.

An example of excitation samples selected by the
above-mentioned algorithm will be described next using FIG. 7A
to FIG. 7F. Assume that the index of first codebook 61 is j,
and the index of second codebook 62 is m or n.

As can be understood from FIG. 7A and FIG. 7B, in the case of
j+m, since the pulse positions of the sub-excitation vectors of
subcodebooks 61a and 62a are relatively near, a small value of
the addition gain is calculated using equation (1) described
previously. Accordingly, the addition gain for subcodebooks 61b
and 62b is small. As a result, as illustrated in FIG. 7C,
excitation vector addition section 68 obtains an excitation
sample composed of a small number of pulses, which reflects the
characteristics of subcodebooks 61a and 62a respectively
illustrated in FIG. 7A and FIG. 7B. This excitation sample is
effective for voiced speech.

Further, as can be understood from FIG. 7A and FIG. 7B, in the
case of j+n, since the pulse positions of the sub-excitation
vectors of subcodebooks 61a and 62a are relatively far, a large
value of the addition gain is calculated using equation (1)
described previously. Accordingly, the addition gain for
subcodebooks 61b and 62b is large. As a result, as illustrated
in FIG. 7F, excitation vector addition section 68 obtains an
excitation sample with strongly random characteristics and
spread energy, which reflects the characteristics of
subcodebooks 61b and 62b respectively illustrated in FIG. 7D
and FIG. 7E. This excitation sample is effective for unvoiced
speech/background noise.

This embodiment describes the case of using two codebooks (two
channels). However, it is also preferable to apply the present
invention to the case of using three or more codebooks (three
or more channels). In this case, as the numerator of the
equation used in gain calculating section 63, i.e., equation
(1), the minimum value among the intervals between two pulses
or the average value of all pulse intervals is used. For
example, in the case where the number of codebooks is three and
the minimum pulse interval is used as the numerator of equation
(1), the calculation is given by the following equation (2):

g = min(|P1 - P2|, |P2 - P3|, |P3 - P1|) / L   ... equation (2)

where g is the addition gain, P1, P2 and P3 are the respective
pulse positions in the three codebooks, and L is the vector
length (subframe length). In addition, | | represents an
absolute value.
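
The generalization to three or more codebooks can be sketched
as follows. The function name and the `mode` parameter are
hypothetical, and interpreting "the average value of all pulse
intervals" as the mean of all pairwise distances is an
assumption made for illustration.

```python
from itertools import combinations


def addition_gain_multi(positions, length, mode="min"):
    """Addition gain from the pulse positions of two or more codebooks.

    mode="min" follows equation (2); mode="mean" averages all pairwise
    distances (assumed reading of the averaged variant).
    """
    distances = [abs(p - q) for p, q in combinations(positions, 2)]
    if mode == "min":
        numerator = min(distances)
    elif mode == "mean":
        numerator = sum(distances) / len(distances)
    else:
        raise ValueError("mode must be 'min' or 'mean'")
    return numerator / length
```

With two positions the function reduces to equation (1). For
positions 4, 10 and 30 in a 40-sample subframe, the pairwise
distances are 6, 26 and 20, so the minimum variant gives
g = 0.15.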

As described above, according to this embodiment, each of a
plurality of codebooks has two subcodebooks holding
sub-excitation vectors with different characteristics, and the
excitation vector is obtained by adding the sub-excitation
vectors, thereby making it possible to handle input signals
with various characteristics.

In addition, since the gain by which the sub-excitation vector
is multiplied is varied according to the characteristics of the
sub-excitation vectors, it is possible to reflect the
characteristics of the excitation vectors stored in the two
codebooks in the speech by a gain adjustment, thereby making it
possible to effectively execute the coding and decoding most
suitable for input signals with various characteristics.

Specifically, since one of the two subcodebooks stores a
plurality of sub-excitation vectors composed of a small number
of pulses, and the other subcodebook stores a plurality of
sub-excitation vectors composed of a large number of pulses, it
is possible to achieve fine sound quality in voiced speech by
the excitation sample with the characteristics of a small
number of pulses, and to perform the excitation generation most
suitable for input signals with various characteristics.

In addition, since the gain calculating section calculates a gain
using the distance between pulse positions of the sub-excitation
vectors composed of a small number of pulses, it is possible to
achieve synthesized speech with fine sound quality in voiced
speech by the small number of pulses whose positions are near,
while achieving perceptually fine synthesized speech in unvoiced
speech/background noise by the large number of pulses with spread
energy.
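The gain control summarized above can be sketched as follows. This is
only an illustrative sketch, assuming the two-codebook form
g = |P1 - P2| / L that appears as equation (1) in the claims; the
function names addition_gain and channel_excitation are hypothetical
and are not taken from the specification.

```python
def addition_gain(p1: int, p2: int, length: int) -> float:
    """Addition gain g = |P1 - P2| / L (two-codebook case).

    p1, p2: pulse positions of the small-pulse sub-excitation vectors.
    length: vector length L (subframe length).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return abs(p1 - p2) / length


def channel_excitation(sparse, dense, gain):
    """Add the small-pulse vector and the gain-scaled large-pulse vector."""
    if len(sparse) != len(dense):
        raise ValueError("sub-excitation vectors must have equal length")
    return [s + gain * d for s, d in zip(sparse, dense)]
```

With near pulse positions the gain is small and the small-pulse vector
dominates; with far pulse positions the large-pulse vector contributes
more.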

In the addition gain calculation described above, the processing
can be simplified by using a predetermined fixed value as the
addition gain. In this case, it is not necessary to install gain
calculating section 63. Even in this case, it is possible to
achieve synthesized speech that matches the needs at the time by
varying the setting of the fixed value properly. For example,
coding excellent for plosive speech, such as a low voice like a
male voice, is achieved by setting the addition gain small, while
coding excellent for random speech such as background noise is
achieved by setting the addition gain large.

In addition, besides the method of calculating the addition gain
using pulse positions and the method of using a fixed coefficient
as the addition gain, it is also preferable to apply a method of
calculating the addition gain adaptively using the level of input
signal power, decoded LPC or the adaptive codebook. For example,
it may be possible to achieve excellent coding adapted to
localized speech characteristics by preparing a function for
determining voiced speech characteristics (such as vowel and
standing wave) or unvoiced speech characteristics (such as
background noise and unvoiced consonant), and setting a small gain
in the case of voiced speech characteristics while setting a large
gain in the case of unvoiced speech characteristics.

(Second embodiment)

This embodiment describes the case where a gain calculating
section obtains decoded LPC from LPC analyzing section 42 and
performs a voiced/unvoiced judgment using the obtained LPC.

FIG.8 is a block diagram illustrating a stochastic codebook in the
speech coding apparatus/speech decoding apparatus according to the
second embodiment of the present invention. The configurations of
the speech coding apparatus and the speech decoding apparatus with
the stochastic codebook are the same as in the first embodiment
(FIG.3 and FIG.4).

The stochastic codebook has first codebook 71 and second codebook
72, and first codebook 71 and second codebook 72 respectively have
two subcodebooks 71a, 71b and subcodebooks 72a, 72b. The
stochastic codebook further has gain calculating section 73 which
calculates a gain for outputs from subcodebooks 71b and 72b using
pulse positions in subcodebooks 71a and 72a.

Subcodebooks 71a and 72a are mainly used in the case where a
speech is a voiced sound (pulse positions are relatively near),
and are formed by storing a plurality of sub-excitation vectors
composed of a single pulse. Subcodebooks 71b and 72b are mainly
used in the case where a speech is an unvoiced sound or background
noise (pulse positions are relatively far), and are formed by
storing a plurality of sub-excitation vectors composed of a
sequence with a plurality of pulses in which power is spread. The
excitation samples are generated in the stochastic codebooks
formed as described above.

In addition, subcodebooks 71a and 72a are formed by a method of
arranging pulses algebraically, and subcodebooks 71b and 72b are
formed by another method of dividing a vector length (subframe
length) into several segment intervals and configuring the vector
so that a single pulse is always present in every segment interval
(pulses are spread over the whole length).

These codebooks are formed in advance. In this embodiment, as
illustrated in FIG.8, the number of codebooks is set at two and
each codebook has two subcodebooks. The number of codebooks and
the number of subcodebooks are not limited.

FIG.6A illustrates sub-excitation vectors stored in subcodebook
71a of first codebook 71. FIG.6B illustrates sub-excitation
vectors stored in subcodebook 71b of first codebook 71. Similarly,
subcodebooks 72a and 72b of second codebook 72 respectively have
sub-excitation vectors illustrated in FIG.6A and FIG.6B.

In addition, positions and polarities of pulses of sub-excitation
vectors in subcodebooks 71b and 72b are formed using random
numbers. According to the configuration described above, it is
possible to form sub-excitation vectors in which power is
uniformly spread over the whole vector length even though some
fluctuations are present. FIG.6B illustrates an example in the
case where the number of segment intervals is four. In addition,
in these two subcodebooks, respective sub-excitation vectors of
the same index (number) are used at the same time.
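The construction of the spread-pulse sub-excitation vectors described
above can be sketched as follows. This is an illustrative sketch
only: the function name spread_pulse_vector, the unit pulse
amplitude and the requirement that the vector length be divisible by
the number of segments are assumptions made for the example, not
details given in the specification.

```python
import random


def spread_pulse_vector(length: int, segments: int, rng: random.Random):
    """Build one sub-excitation vector with one pulse per segment interval.

    The position inside each segment and the polarity (+1 or -1) of each
    pulse are drawn from random numbers, so power is spread over the
    whole vector length.
    """
    if segments <= 0 or length % segments != 0:
        raise ValueError("length must be a positive multiple of segments")
    seg_len = length // segments
    vector = [0.0] * length
    for s in range(segments):
        position = s * seg_len + rng.randrange(seg_len)  # random position
        vector[position] = rng.choice((1.0, -1.0))       # random polarity
    return vector


# Example: subframe length 40 (assumed) with four segment intervals.
example = spread_pulse_vector(40, 4, random.Random(0))
```

Because every segment interval always holds exactly one pulse, the
vector cannot concentrate its energy at one point, which is the
property used for unvoiced speech and background noise.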

Speech coding using the stochastic codebooks with the
above-mentioned configuration is explained next.

Gain calculating section 73 obtains decoded LPC from LPC analyzing
section 42 and performs a voiced/unvoiced judgment using the
obtained LPC. Specifically, gain calculating section 73 collects
beforehand data corresponding to LPC (for example, data obtained
by converting the LPC into an impulse response or an LPC cepstrum)
for a large amount of speech data, associating the data with each
mode, for example, voiced speech, unvoiced speech and background
noise. The data are then subjected to statistical processing, and
based on the result, a rule for judging voiced speech, unvoiced
speech and background noise is generated. A linear discriminant
function or Bayes judgment is generally used as such a rule. Then,
based on the judgment result obtained according to the rule,
weighting coefficient R is obtained according to the following
equation (3):

R = L        : when judged as voiced speech
R = L x 0.5  : when judged as unvoiced speech/background noise
                                                   ...equation (3)

where R is a weighting coefficient, and L is a vector length
(subframe length).

Gain calculating section 73 next receives an instruction of the
excitation vector number (index number) from comparing section 47
in the speech coding apparatus and, according to the instruction,
fetches the sub-excitation vectors of the designated number
respectively from subcodebooks 71a and 72a with a small number of
pulses. Gain calculating section 73 calculates an addition gain
using pulse positions of the fetched sub-excitation vectors. The
calculation of the addition gain is executed according to the
following equation (4):

g = |P1 - P2| / R                                  ...equation (4)

where g is an addition gain, P1 and P2 are the respective pulse
positions in subcodebooks 71a and 72a, and R is a weighting
coefficient. Further, | | represents an absolute value.

According to the above equations (3) and (4), the addition gain is
smaller as the pulse positions are nearer and larger as the pulse
positions are farther, and has a lower limit of 0 and an upper
limit of L/R. Accordingly, as the pulse positions are nearer, the
gain for subcodebooks 71b and 72b is relatively smaller. As a
result, the effect of subcodebooks 71a and 72a corresponding to
voiced speech is larger. On the other hand, as the pulse positions
are farther, the gain for subcodebooks 71b and 72b is relatively
larger. As a result, the effect of subcodebooks 71b and 72b
corresponding to unvoiced speech and background noise is larger.
Perceptually fine sounds are obtained by performing the gain
calculation described above.

Further, excitation vector addition section 76 obtains a
sub-excitation vector from subcodebook 71a with a small number of
pulses by referring to the excitation vector number provided from
comparing section 47, and also obtains a sub-excitation vector
from subcodebook 71b multiplied by the addition gain obtained at
gain calculating section 73. Excitation vector addition section 76
then adds the obtained sub-excitation vectors to obtain an
excitation vector. Similarly, excitation vector addition section
77 obtains a sub-excitation vector from subcodebook 72a with a
small number of pulses by referring to the excitation vector
number provided from comparing section 47, and also obtains a
sub-excitation vector from subcodebook 72b multiplied by the
addition gain obtained at gain calculating section 73. Excitation
vector addition section 77 then adds the obtained sub-excitation
vectors to obtain an excitation vector.
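The gain calculation of equations (3) and (4) and the per-channel
addition can be sketched as follows. This is only a sketch under
stated assumptions: the voiced/unvoiced judgment is passed in as a
boolean because the statistical rule itself is not reproduced here,
and all function names are hypothetical.

```python
def weighting_coefficient(is_voiced: bool, length: int) -> float:
    """Equation (3): R = L for voiced speech, R = L x 0.5 otherwise."""
    return float(length) if is_voiced else 0.5 * length


def addition_gain(p1: int, p2: int, r: float) -> float:
    """Equation (4): g = |P1 - P2| / R."""
    return abs(p1 - p2) / r


def stochastic_code_vector(sub_a1, sub_b1, sub_a2, sub_b2,
                           p1, p2, is_voiced):
    """Form the excitation sample from two channels.

    sub_a1, sub_a2: small-pulse sub-excitation vectors (pulse at p1, p2).
    sub_b1, sub_b2: large-pulse sub-excitation vectors of the same index.
    """
    length = len(sub_a1)
    g = addition_gain(p1, p2, weighting_coefficient(is_voiced, length))
    channel1 = [a + g * b for a, b in zip(sub_a1, sub_b1)]
    channel2 = [a + g * b for a, b in zip(sub_a2, sub_b2)]
    # The two channel vectors are finally added to give the excitation sample.
    return [x + y for x, y in zip(channel1, channel2)]
```

For the same pulse distance, a voiced judgment halves the gain applied
to the large-pulse vectors compared with an unvoiced judgment, so a
wrong judgment changes only the mixing ratio and not the selected
vectors.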

The excitation vectors respectively obtained by adding the
sub-excitation vectors are provided to excitation vector addition
section 68 to be added. According to the foregoing processing, an
excitation sample (stochastic code vector) is obtained. The
excitation sample is provided to excitation generating section 45
and parameter coding section 48.

On the other hand, a decoding side prepares in advance the same
adaptive codebook and stochastic codebook as those in the coder,
and based on the respective index, LPC code and gain code of each
codebook transmitted over the transmission path, multiplies each
excitation sample by its gain and adds the results. Then the
decoding side executes filtering on the added sample with the
decoded LPC to decode the speech.

At this stage, in this embodiment, it is necessary to provide the
decoded LPC to the stochastic codebook, which differs from the
first embodiment. Specifically, parameter decoding section 52
provides the obtained LPC along with the sample number for the
stochastic codebook to the stochastic codebook (this corresponds
to the signal line from parameter decoding section 52 to
stochastic codebook 54 in FIG.4 including both the signal line
from "LPC analyzing section 42" and the control line indicative of
"control from comparing section 47").

The excitation samples selected by the above algorithm are the
same as in the first embodiment and are illustrated in FIG.7A to
FIG.7F.

As described above, according to this embodiment, gain calculating
section 73 performs the voiced/unvoiced judgment using the decoded
LPC, and calculates the addition gain using weighting coefficient
R obtained according to equation (3), resulting in a small gain at
the time of voiced speech and a large gain at the time of unvoiced
speech and background noise. The obtained excitation samples
thereby have a smaller number of pulses in voiced speech and a
larger number of pulses containing more noise in unvoiced speech
and background noise. Accordingly, it is possible to further
improve the effect of the adaptive pulse positions described
above, thereby enabling synthesized speech with more excellent
sound quality to be achieved.

In addition, the speech coding in this embodiment is also
effective against transmission errors. In coding with a
conventional voiced/unvoiced judgment, stochastic codebooks are
generally switched by LPC. Because of this, when a transmission
error causes a wrong judgment, the decoding is sometimes executed
with completely different excitation samples, resulting in low
resistance to transmission errors.

On the contrary, in the speech coding in this embodiment, even if
wrong LPC are used in the voiced/unvoiced judgment in decoding,
only the value of the addition gain varies a little, and the
deterioration caused by the transmission error is small. Hence,
according to this embodiment, it is possible to obtain synthesized
speech with excellent sound quality without being largely affected
by a transmission error of the LPC code, while executing the
adaptation by LPC.

This embodiment describes the case of using two codebooks (two
channels). However, it is also preferable to apply the present
invention to the case of using three or more codebooks (three or
more channels). In this case, as the numerator of the equation in
gain calculating section 63, i.e., equation (4), the minimum value
among the intervals between two pulses or the averaged value of
all pulse intervals is used.

The first and second embodiments describe the case of adjusting
gains for outputs from subcodebooks 61b, 62b, 71b and 72b.
However, it is also preferable to adjust outputs from subcodebooks
61a, 62a, 71a and 72a, or to adjust outputs from all subcodebooks,
under the condition that the gain for outputs from the
subcodebooks is adjusted so that the effect of excitation vectors
with a small number of pulses is large when pulse positions are
near, while the effect of excitation vectors with a large number
of pulses is large when pulse positions are far.
(Third embodiment) 

This embodiment describes the case of switching the subcodebook
from which an excitation vector is acquired according to the
distance between pulse positions.

FIG.9 is a block diagram illustrating a stochastic codebook in the
speech coding apparatus/speech decoding apparatus according to the
third embodiment of the present invention. The configurations of
the speech coding apparatus and the speech decoding apparatus with
the stochastic codebook are the same as in the first embodiment
(FIG.3 and FIG.4).

The stochastic codebook has first codebook 91 and second codebook
92, and first codebook 91 and second codebook 92 respectively have
two subcodebooks 91a, 91b and subcodebooks 92a, 92b. The
stochastic codebook further has excitation switching instructing
section 93 which executes switching between outputs from
subcodebooks 91b and 92b corresponding to a pulse position in
subcodebooks 91a and 92a.

Subcodebooks 91a and 92a are mainly used in the case where a
speech is a voiced sound (pulse positions are relatively near),
and are formed by storing a plurality of sub-excitation vectors
composed of a single pulse. Subcodebooks 91b and 92b are mainly
used in the case where a speech is an unvoiced sound or background
noise (pulse positions are relatively far), and are formed by
storing a plurality of sub-excitation vectors composed of a
sequence with a plurality of pulses in which power is spread. The
excitation samples are generated in the stochastic codebooks
formed as described above.

In addition, subcodebooks 91a and 92a are formed by a method of
arranging pulses algebraically, and subcodebooks 91b and 92b are
formed by another method of dividing a vector length (subframe
length) into several segment intervals and configuring the vector
so that a single pulse is always present in every segment interval
(pulses are spread over the whole length).

These codebooks are formed in advance. In this embodiment, as
illustrated in FIG.9, the number of codebooks is set at two and
each codebook has two subcodebooks. The number of codebooks and
the number of subcodebooks are not limited.

FIG.10A illustrates sub-excitation vectors stored in subcodebook
91a of first codebook 91. FIG.10B illustrates sub-excitation
vectors stored in subcodebook 91b of first codebook 91. Similarly,
subcodebooks 92a and 92b of second codebook 92 respectively have
sub-excitation vectors illustrated in FIG.10A and FIG.10B.

In addition, positions and polarities of pulses of sub-excitation
vectors in subcodebooks 91b and 92b are formed using random
numbers. According to the configuration described above, it is
possible to form sub-excitation vectors in which power is
uniformly spread over the whole vector length even though some
fluctuations are present. FIG.10B illustrates an example in the
case where the number of segment intervals is four. In addition,
in these two subcodebooks, respective sub-excitation vectors of
the same index (number) are not used at the same time.

The next description is given to explain speech coding 
using the stochastic codebooks with the above-mentioned 
configuration. 

Excitation switching instructing section 93 calculates the
excitation vector number (index) according to a code from
comparing section 47 in the speech coding apparatus. The code
provided from comparing section 47 corresponds to the excitation
vector number, and therefore the excitation vector number is
determined by the code. Excitation switching instructing section
93 fetches the sub-excitation vectors with a small number of
pulses corresponding to the determined excitation vector number
from subcodebooks 91a and 92a. Further, excitation switching
instructing section 93 executes the judgment described below,
using pulse positions of the fetched sub-excitation vectors:

|P1 - P2| < Q : using subcodebooks 91a and 92a
|P1 - P2| ≥ Q : using subcodebooks 91b and 92b,

where P1 and P2 are the respective pulse positions in subcodebooks
91a and 92a, Q is a constant, and | | represents an absolute
value.

In the above judgment, excitation vectors with a small number of
pulses are selected when pulse positions are near, while
excitation vectors with a large number of pulses are selected when
pulse positions are far. Performing the judgment and selection as
described above enables perceptually fine sounds to be achieved.
The constant Q is predetermined. It is possible to vary the ratio
between the excitation with a small number of pulses and the
excitation with a large number of pulses by varying the constant
Q.

Excitation switching instructing section 93 fetches excitation
vectors from subcodebooks 91a and 92a or from subcodebooks 91b and
92b in codebooks 91 and 92 according to the switching information
(switching signal) and the code of excitation (sample number). The
switching is executed at first and second switches 94 and 95.

The obtained excitation vectors are provided to excitation vector
addition section 96 to be added. The excitation sample (stochastic
code vector) is thus obtained. The excitation sample is provided
to excitation generating section 45 and parameter coding section
48. In addition, at a decoding side, the excitation sample is
provided to excitation generating section 55.
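The switching judgment and the subsequent addition can be sketched as
follows. This is a minimal sketch under stated assumptions: the
function names are hypothetical, and the second branch is taken to be
|P1 - P2| ≥ Q, which is how the three-codebook form of the judgment
is written in this description.

```python
def select_subcodebooks(p1: int, p2: int, q: int) -> str:
    """Return "a" (small number of pulses) when |P1 - P2| < Q, else "b"."""
    return "a" if abs(p1 - p2) < q else "b"


def switched_excitation(sub_a1, sub_b1, sub_a2, sub_b2, p1, p2, q):
    """Pick one sub-excitation vector per codebook and add the two.

    Unlike the gain-based embodiments, no gain is computed or multiplied:
    the judgment only decides which stored vectors are passed to the adder.
    """
    if select_subcodebooks(p1, p2, q) == "a":
        first, second = sub_a1, sub_a2
    else:
        first, second = sub_b1, sub_b2
    return [x + y for x, y in zip(first, second)]
```

Raising Q makes the small-pulse vectors win more often, which is the
ratio control by Q mentioned in the description.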

An example of excitation samples selected by the above-mentioned
algorithm will be described next using FIG.11A to FIG.11F. Assume
that an index of first codebook 91 is j, and an index of second
codebook 92 is m or n.

As can be understood from FIG.11A and FIG.11B, in the case of j+m,
since the pulse positions of the sub-excitation vectors of
subcodebooks 91a and 92a are relatively near, excitation switching
instructing section 93 selects sub-excitation vectors with a small
number of pulses according to the above judgment. Then, excitation
vector addition section 96 adds the two sub-excitation vectors
selected respectively from subcodebooks 91a and 92a illustrated in
FIG.11A and FIG.11B, and obtains an excitation sample with strong
pulse characteristics as illustrated in FIG.11C. This excitation
sample is effective for voiced speech.

Further, as can be understood from FIG.11A and FIG.11B, in the
case of j+n, since the pulse positions of the sub-excitation
vectors of subcodebooks 91a and 92a are relatively far, excitation
switching instructing section 93 selects sub-excitation vectors
with a large number of pulses according to the above judgment.
Then, excitation vector addition section 96 adds the two
sub-excitation vectors selected respectively from subcodebooks 91b
and 92b illustrated in FIG.11D and FIG.11E, and obtains an
excitation sample with strong random characteristics and spread
energy as illustrated in FIG.11F. This excitation sample is
effective for unvoiced speech/background noise.

As described above, according to this embodiment, an excitation
sample is generated by switching between the two subcodebooks that
each of a plurality of codebooks has, and using the excitation
vector obtained from either of the subcodebooks in each codebook.
It is thus possible to handle input signals with various
characteristics with a smaller amount of computation.

Since one of the two subcodebooks stores a plurality of excitation
vectors with a small number of pulses while the other stores a
plurality of excitation vectors with a large number of pulses in
which power is spread, it is possible to use the excitation sample
with a small number of pulses for voiced speech while using the
excitation sample with a large number of pulses for unvoiced
speech/background noise. It is thereby possible to obtain
synthesized speech with excellent sound quality, and also to
obtain excellent performance for input signals with various
properties.

Further, since the excitation switching instructing section
switches the subcodebook from which excitation vectors are
acquired according to the distance between pulse positions, it is
possible to achieve synthesized speech with fine sound quality in
voiced speech by a small number of pulses whose positions are
near, while achieving perceptually fine synthesized speech in
unvoiced speech and background noise by a large number of pulses
in which power is spread. Furthermore, since the excitation
switching instructing section acquires excitation vectors from a
subcodebook by switching, it is not necessary, for example, to
calculate a gain and multiply a vector by the gain in the
stochastic codebook. Accordingly, in the speech coding according
to this embodiment, the computation amount is much less than in
the case of calculating the gain.

That is, since the above-mentioned switching is executed based on
the relative distance between pulse positions of the
sub-excitation vectors composed of a small number of pulses, it is
possible to achieve fine synthesized speech in voiced speech by
excitation samples with a small number of pulses whose positions
are near, while achieving perceptually fine synthesized speech in
unvoiced speech/background noise by excitation samples with a
large number of pulses with spread power.

This embodiment describes the case of using two codebooks (two
channels). However, it is also preferable to apply the present
invention to the case of using three or more codebooks (three or
more channels). In this case, as the judgment basis in excitation
switching instructing section 93, the minimum value among the
intervals between two pulses or the averaged value of all pulse
intervals is used. For example, in the case of using three
codebooks and the minimum value among the intervals between two
pulses, the judgment basis is as follows:

min(|P1 - P2|, |P2 - P3|, |P3 - P1|) < Q : using subcodebooks a
min(|P1 - P2|, |P2 - P3|, |P3 - P1|) ≥ Q : using subcodebooks b

where P1, P2 and P3 are the respective pulse positions in the
respective codebooks, Q is the constant used as the threshold, and
| | represents an absolute value.
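The judgment basis for three or more codebooks can be sketched as
follows. This is an illustrative sketch: the function names are
hypothetical, and "the averaged value of all pulse intervals" is
assumed here to mean the mean of all pairwise distances between the
pulse positions, which the description does not spell out.

```python
from itertools import combinations


def pulse_interval_measure(positions, use_min=True):
    """Minimum (or mean) of the pairwise distances between pulse positions."""
    gaps = [abs(a - b) for a, b in combinations(positions, 2)]
    if not gaps:
        raise ValueError("at least two pulse positions are required")
    return min(gaps) if use_min else sum(gaps) / len(gaps)


def choose_subcodebooks_multi(positions, q, use_min=True):
    """Return "a" when the interval measure is below Q, otherwise "b"."""
    return "a" if pulse_interval_measure(positions, use_min) < q else "b"
```

With positions 3, 10 and 30 the pairwise distances are 7, 27 and 20,
so the minimum-based basis and the average-based basis can select
different subcodebooks for the same threshold.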

In the speech coding/decoding according to this embodiment, it may
be possible to combine a voiced/unvoiced judgment algorithm in the
same way as in the second embodiment. In other words, at a coding
side, the excitation switching instructing section obtains decoded
LPC from the LPC analyzing section and executes the
voiced/unvoiced judgment using the LPC, and at a decoding side,
the decoded LPC is provided to the stochastic codebook. According
to the aforementioned processing, it is possible to improve the
effect of the adapted pulse positions and achieve synthesized
speech with more excellent sound quality.

The above constitution is achieved by providing voiced/unvoiced
judgment sections separately at a coding side and a decoding side
and, according to the judgment result, making Q variable as the
threshold value for the judgment of the excitation switching
instructing section. In this case, Q is set large in the case of
voiced speech and small in the case of unvoiced speech, so that
the ratio between the number of excitations with a small number of
pulses and the number of excitations with a large number of pulses
can be varied according to localized characteristics of the
speech.

In addition, in the case where the voiced/unvoiced judgment is
executed in a backward manner (using other decoded parameters
without transmitting the judgment as a code), there is a
possibility that a wrong judgment occurs due to a transmission
error. According to the coding/decoding in this embodiment, since
the voiced/unvoiced judgment is reflected only by varying
threshold Q, a wrong judgment affects only the difference in
threshold Q between the cases of voiced speech and unvoiced
speech. Accordingly, the effect of the wrong judgment is very
small.

In addition, it may be possible to use a method of calculating Q
adaptively using the level of input signal power, decoded LPC or
the adaptive codebook. For example, a function for determining
voiced characteristics (such as vowel and standing wave) or
unvoiced characteristics (such as background noise and unvoiced
consonant) using the above parameters is prepared in advance, and
Q is set large at the time of the voiced characteristics and small
at the time of the unvoiced characteristics. According to the
aforementioned processing, it is possible to use an excitation
sample composed of a small number of pulses in an interval with
voiced characteristics and another excitation sample composed of a
large number of pulses in an interval with unvoiced
characteristics, thereby making it possible to obtain excellent
coding performance adapted to localized speech characteristics.
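The adaptive threshold described above can be sketched as follows.
This is only a sketch: the threshold values 12 and 4 and the function
names are hypothetical choices made for illustration, and the
voiced/unvoiced determination is supplied as a boolean because the
determining function is not specified here.

```python
def adaptive_threshold(is_voiced: bool,
                       q_voiced: int = 12, q_unvoiced: int = 4) -> int:
    """Large Q for voiced characteristics, small Q for unvoiced ones."""
    return q_voiced if is_voiced else q_unvoiced


def uses_small_pulse_excitation(p1: int, p2: int, is_voiced: bool) -> bool:
    """True when |P1 - P2| < Q, i.e. the small-pulse subcodebooks are used."""
    return abs(p1 - p2) < adaptive_threshold(is_voiced)
```

A wrong voiced/unvoiced judgment changes the selection only when the
pulse distance lies between the two thresholds; for clearly near or
clearly far pulse positions the same subcodebooks are selected either
way.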

In addition, the speech coding/decoding according to the first to
third embodiments is described as a speech coding apparatus/speech
decoding apparatus; however, it may be possible to construct the
speech coding/decoding as software. For example, it may be
possible to store the program for the above-described speech
coding/decoding in a ROM and operate by instructions of a CPU
according to the program. Further, as illustrated in FIG.12, it
may be possible to store program 101a, adaptive codebook 101b and
stochastic codebook 101c in recording medium 101 which is readable
by a computer, write program 101a, adaptive codebook 101b and
stochastic codebook 101c of recording medium 101 into a RAM of the
computer, and operate according to the program. These cases also
achieve the same functions and effects as the first to third
embodiments described above.

The first to third embodiments describe the case where the number
of pulses is one in an excitation vector with a small number of
pulses; however, it may be possible to use an excitation vector in
which the number of pulses is two or more as an excitation vector
with a small number of pulses. In this case, it is preferable to
apply the interval between the pulses whose positions are the
nearest among the plurality of pulses as the near-far judgment of
pulse positions.

The first to third embodiments describe the case of adapting the
present invention to a speech coding apparatus/speech decoding
apparatus in the CELP system; however, the present invention is
applicable to any speech coding/decoding using a "codebook",
because the feature of the present invention lies in the
stochastic codebook. For example, the present invention is
applicable to "RPE-LTP", which is the standard full rate codec of
GSM, and to "MP-MLQ", which is used in the international standard
codec "G.723.1" of ITU-T.

This application is based on the Japanese Patent Applications
No.HEI10-160119 filed on June 9, 1998 and No.HEI10-258271 filed on
September 11, 1998, entire contents of which are expressly
incorporated by reference herein.

Industrial applicability

The speech coding apparatus and speech decoding apparatus
according to the present invention are applicable to portable
telephones and digital communications using speech coding
algorithms at low bit rates.

CLAIMS

1. An apparatus for performing speech coding in a CELP system,
said apparatus comprising:

an adaptive codebook in which previously synthesized excitation
signals are stored;

a stochastic codebook in which a plurality of excitation vectors
are stored, said stochastic codebook having a first subcodebook in
which excitation vectors composed of a small number of pulses are
stored and a second subcodebook in which excitation vectors
composed of a large number of pulses are stored;

means for obtaining a synthesized speech using excitation
information acquired from said adaptive codebook and said
stochastic codebook, using LPC obtained by performing LPC analysis
on an input speech signal;

means for obtaining gain information for said synthesized speech
using a relation of said synthesized speech and said input speech
signal; and

means for transmitting said LPC, said excitation information and
said gain information.

2. The apparatus according to claim 1, wherein said stochastic
codebook further has control means for controlling a gain for
respective excitation vectors in said first subcodebook and said
second subcodebook corresponding to a distance between pulses of
the excitation vectors in said first subcodebook, and computation
means for obtaining the excitation information using the gain
controlled excitation vectors.

3. The apparatus according to claim 1, wherein said control means
makes the gain for the excitation vectors in said second
subcodebook relatively small in a case where the distance between
pulses of excitation vectors in said first subcodebook is short,
while making the gain for the excitation vectors in said second
subcodebook relatively large in another case where the distance
between pulses of excitation vectors in said first subcodebook is
long.
4. The apparatus according to claim 3, wherein said control means
calculates the gain according to a following equation (1),

g = |P1 - P2| / L                                  ...equation (1)

wherein g is the gain, P1 and P2 are respectively excitation
vector positions in said first subcodebook, and L is a vector
length.

5. The apparatus according to claim 1, wherein said stochastic
codebook further has instruction means for instructing an
excitation vector to be acquired from said first subcodebook and
said second subcodebook corresponding to a distance between
excitation vectors in said first subcodebook, and switching means
for switching between outputs of the excitation vectors in said
first subcodebook and said second subcodebook according to the
instruction by said instruction means.

25 6. An apparatus for performing speech coding in a 

CELP system, said apparatus comprising: 

an adaptive codebook in which previously synthesized 
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excitation signals are stored; 

a stochastic codebook in which a plurality of excitation 
vectors are stored, said stochastic codebook having a first 
subcodebook in which excitation vectors composed of a small 
number of pulses are stored and a second subcodebook in which
excitation vectors composed of a large number of pulses are 
stored; 

means for obtaining a synthesized speech using 
excitation information acquired from said adaptive codebook 
and said stochastic codebook, using LPC obtained by performing
LPC analysis on an input speech signal; 

means for executing a voiced/unvoiced judgment on said 
input speech signal using said LPC; 

means for obtaining gain information for said 
synthesized speech using a relation of said synthesized speech
and said input speech signal; and 

means for transmitting said LPC, said excitation 
information and said gain information. 
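
The claim does not specify the relation used to obtain the gain information. One common reading in CELP coders, taken here purely as an assumption, is the gain that minimizes the squared error between the input speech and the scaled synthesized speech. The function name `optimal_gain` is hypothetical.

```python
from typing import Sequence


def optimal_gain(input_speech: Sequence[float], synthesized: Sequence[float]) -> float:
    """Least-squares gain: g = <x, s> / <s, s>.

    Assumption: the 'relation' between the two signals is a squared-error fit.
    """
    if len(input_speech) != len(synthesized):
        raise ValueError("signals must have the same length")
    energy = sum(s * s for s in synthesized)
    if energy == 0.0:
        return 0.0  # a silent synthesized signal cannot be scaled to fit
    correlation = sum(x * s for x, s in zip(input_speech, synthesized))
    return correlation / energy
```

For an input that is exactly twice the synthesized signal, this returns a gain of 2.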

7. The apparatus according to claim 6, wherein said
stochastic codebook further has control means for controlling
a gain for respective excitation vectors in said first
subcodebook and said second subcodebook corresponding to a
distance between pulses of the excitation vector in said first
subcodebook, and computation means for obtaining the
excitation information using the gain controlled excitation
vectors.

8. The apparatus according to claim 6, wherein said
control means makes the gain for the excitation vector in said
second subcodebook relatively small in a case where the
distance between pulses of excitation vectors in said first
subcodebook is short, and makes the gain for the excitation
vector in said second subcodebook relatively large in another
case where the distance between pulses of excitation vectors
in said first subcodebook is long.

9. The apparatus according to claim 7, wherein said
control means calculates the gain according to the following
equation (2),

g = |P1 - P2| / R ... equation (2)

wherein g is the gain, P1 and P2 are respectively excitation
vector positions in said first subcodebook, and R represents
a weighting coefficient and is a vector length L in a case where
a result of the voiced/unvoiced judgment indicates a voiced
speech, and L × 0.5 in another case where the result of the
voiced/unvoiced judgment indicates an unvoiced speech.
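
Equation (2) can be evaluated as in the following minimal sketch. The function name `weighted_pulse_distance_gain` and the boolean `voiced` flag standing in for the result of the voiced/unvoiced judgment are assumptions made for illustration.

```python
def weighted_pulse_distance_gain(p1: int, p2: int, vector_length: int,
                                 voiced: bool) -> float:
    """Equation (2): g = |P1 - P2| / R, with R = L (voiced) or L * 0.5 (unvoiced)."""
    if vector_length <= 0:
        raise ValueError("vector_length must be positive")
    r = vector_length if voiced else vector_length * 0.5
    return abs(p1 - p2) / r
```

For the same pulse positions, the unvoiced case yields twice the gain of the voiced case. The result can exceed 1 when the speech is unvoiced and the pulses are more than L/2 apart; the claim does not say whether the gain is limited in that case, so the sketch leaves it unclamped.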

10. The apparatus according to claim 6, wherein said
stochastic codebook further has instruction means for
instructing an excitation vector to be acquired from said first
subcodebook and said second subcodebook corresponding to a
distance between excitation vectors of said first subcodebook,
and switching means for switching between outputs of the
excitation vectors in said first subcodebook and said second
subcodebook according to the instruction by said instruction
means.

11. An apparatus for performing speech decoding in a
CELP system, said apparatus comprising:

an adaptive codebook in which previously synthesized 
excitation signals are stored; 

a stochastic codebook in which a plurality of excitation 
vectors are stored, said stochastic codebook having a first
subcodebook in which excitation vectors composed of a small 
number of pulses are stored and a second subcodebook in which 
excitation vectors composed of a large number of pulses are 
stored; 

means for receiving LPC, excitation information and gain
information transmitted from a coding side; and

means for decoding a speech using said excitation
information multiplied by said gain information, and said LPC.
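
The decoding means recited above can be pictured with the following minimal sketch of an all-pole synthesis filter driven by the gain-scaled excitation. It is a simplification under stated assumptions: a single gain is applied to one excitation vector, the filter memory starts at zero, and the LPC values are treated as predictor coefficients with the sign convention s[n] = g * e[n] + sum of a[k] * s[n-1-k]. The function name `synthesize` is hypothetical.

```python
from typing import List, Sequence


def synthesize(excitation: Sequence[float], gain: float,
               lpc: Sequence[float]) -> List[float]:
    """Filter the gain-scaled excitation through an all-pole LPC synthesis filter.

    Assumptions: zero initial filter state; lpc[k] weights the output sample
    that lies k + 1 positions in the past.
    """
    out: List[float] = []
    for n, e in enumerate(excitation):
        sample = gain * e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                sample += a * out[n - 1 - k]
        out.append(sample)
    return out
```

With a single coefficient of 0.5 and a unit pulse scaled by a gain of 2, each output sample is half the previous one.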

12. The apparatus according to claim 11, wherein said
apparatus further comprises means for providing said LPC to
said stochastic codebook.

13. A method for performing speech coding in a CELP 
system, said method comprising the steps of: 

controlling a gain for respective excitation vectors in
a first subcodebook and a second subcodebook corresponding to
a distance between pulses of excitation vectors in said first
subcodebook of a stochastic codebook having said first
subcodebook in which excitation vectors composed of a small
number of pulses are stored and said second subcodebook in which
excitation vectors composed of a large number of pulses are
stored;

obtaining excitation information using gain controlled
excitation vectors;

obtaining a synthesized speech using excitation 
information acquired from an adaptive codebook and said 
stochastic codebook, using LPC obtained by performing LPC 
analysis on an input speech signal; and

obtaining gain information for said synthesized speech 
using a relation of said synthesized speech and said input 
speech signal. 

14. The method according to claim 13, wherein said
method further comprises the step of performing a
voiced/unvoiced judgment on said input speech signal using
said LPC.

15. A method for performing speech coding in a CELP 
system, said method comprising the steps of: 

selecting an excitation vector in either a first
subcodebook or a second subcodebook corresponding to a distance
between pulses of excitation vectors in said first subcodebook
of a stochastic codebook having said first subcodebook in which
excitation vectors composed of a small number of pulses are
stored and said second subcodebook in which excitation vectors
composed of a large number of pulses are stored;

obtaining excitation information using the selected 
excitation vector; 

obtaining a synthesized speech using excitation
information acquired from an adaptive codebook and said
stochastic codebook, using LPC obtained by performing LPC
analysis on an input speech signal; and

obtaining gain information for said synthesized speech
using a relation of said synthesized speech and said input
speech signal.

16. The method according to claim 15, wherein said
method further comprises the step of performing a
voiced/unvoiced judgment on said input speech signal using
said LPC.

17. A recording medium readable by a computer, said
medium storing a speech coding program, an adaptive codebook
in which previously synthesized excitation signals are stored,
and a stochastic codebook in which a plurality of excitation
vectors are stored, said stochastic codebook having a first
subcodebook in which excitation vectors composed of a small
number of pulses are stored and a second subcodebook in which
excitation vectors composed of a large number of pulses are
stored, said program including the procedures of:

controlling a gain for respective excitation vectors in 
said first subcodebook and said second subcodebook 
corresponding to a distance between pulses of excitation
vectors in said first subcodebook of said stochastic codebook;

obtaining excitation information using gain controlled 
excitation vectors; 

obtaining a synthesized speech using excitation 
information acquired from said adaptive codebook and said 
stochastic codebook, using LPC obtained by performing LPC
analysis on an input speech signal; and 

obtaining gain information for said synthesized speech 
using a relation of said synthesized speech and said input
speech signal. 